Learning Principles


VES: IEAA ARE: AË (id: redstonewill) 


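The previous lecture covered an important machine learning tool: Validation. We set aside part of the data as a validation set, used only to judge how good each hypothesis is, so that the best model can be selected. This lecture introduces three practical principles for machine learning.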


1. Occam's Razor


Occam's Razor is a principle proposed in the 14th century by the logician William of Occam (c. 1285-1349). Occam (Ockham) is a place in England. The principle is usually stated as "Entities must not be multiplied unnecessarily": do not add anything that is not needed, and prefer the simple and effective explanation.


Applied to machine learning, Occam's Razor says that among the models that fit the data, the simplest one is also the most plausible.








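Consider a simple example with two classifiers. One uses a simple boundary and the other a complicated one, and both separate the training samples correctly. Intuitively, the simple one looks like the better choice. This raises two questions: what does it mean for a model to be simple, and why is a simple model better?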


"Simple" can mean two things. A simple hypothesis h is one that is specified by few parameters, for example a polynomial of low degree. A simple model H is a hypothesis set that contains only a small number of hypotheses.







simple hypothesis h
• small $\Omega(h)$ = "looks" simple
• specified by few parameters

simple model H
• small $\Omega(\mathcal{H})$ = not many
• contains small number of hypotheses





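In fact, a simple hypothesis h and a simple model H are closely connected. If each hypothesis can be specified by $\ell$ bits, then H contains at most $2^\ell$ hypotheses. The fewer parameters a hypothesis needs, the fewer hypotheses H contains.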


So, to obtain a simple model, we can either start from a simple model, or use regularization to make the hypothesis simpler. Both reduce model complexity.


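Why is a simple model better? The goal of machine learning is to find regularity in data. Suppose the data are random and have no pattern. Finding a simple model with $E_{in} = 0$ on such data is very unlikely, although a sufficiently complex model may still separate them. Conversely, if a simple model separates a data set perfectly, that is strong evidence that the data follow some regularity. If a complex model separates the data, we learn nothing, because a complex model can separate regular data and random data alike. For this reason, model selection should start with simple models, for example a linear model.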
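The argument above can be made concrete by counting. The sketch below is an illustration under assumed names, not part of the original notes. It enumerates the labelings of N points on a line that a one-dimensional decision stump (a hypothetical choice of "simple model") can produce, and compares that count with the 2^N labelings that a model memorizing every point could produce.

```python
def stump_dichotomies(xs):
    """Labelings of xs realizable by h(x) = s * sign(x - theta), s in {+1, -1}."""
    pts = sorted(xs)
    # One threshold below all points, one between each pair of neighbors, one above all.
    thetas = (
        [pts[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
        + [pts[-1] + 1.0]
    )
    found = set()
    for theta in thetas:
        for s in (+1, -1):
            found.add(tuple(s if x > theta else -s for x in xs))
    return found


def fraction_fit_by_stump(n):
    """Fraction of all 2^n labelings of n distinct points that a stump fits perfectly."""
    xs = [float(i) for i in range(n)]
    return len(stump_dichotomies(xs)) / 2 ** n


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, fraction_fit_by_stump(n))
```

For distinct points the stump realizes 2N labelings, so a randomly labeled set of 10 points is fit perfectly only 20 times out of 1024. A perfect fit by the stump is therefore informative, while a perfect fit by a memorizing model is not.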


2. Sampling Bias


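Start with a well-known example. In the 1948 US presidential election the two leading candidates were Truman and Dewey. After voting ended, a newspaper ran a telephone poll asking people whether they had voted for Truman or for Dewey. The poll showed more votes for Dewey than for Truman, so before the official result was announced the paper confidently printed the front-page headline "Dewey Defeats Truman". When the result came out, however, Truman had won the election.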


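Why did the poll disagree with the real outcome? Was the counting wrong, or was it an editorial mistake? Neither. At that time telephones were expensive and only a minority of households, mostly wealthier ones, owned one. Those households tended to support Dewey, while many Truman supporters had no telephone and were never reached. The poll therefore overstated Dewey's support.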


The example shows that the way samples are drawn affects the result. In one sentence: "If the data is sampled in a biased way, learning will produce a similarly biased outcome." If the sampling is biased, the learned result is biased as well. This is called Sampling Bias.


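In machine learning, the training data and the test data should come from the same distribution, so that what is learned is representative of what will be tested. In practice, make the training environment match the test environment as closely as possible.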
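The telephone-poll effect can be reproduced with a small simulation. The following is a sketch only: every probability in it is a made-up assumption chosen for illustration, not historical data, and the function name is hypothetical.

```python
import random


def simulate_poll(n_voters=100_000, seed=0):
    """Return (true Dewey share, Dewey share among phone owners).

    All probabilities below are invented for illustration.
    """
    rng = random.Random(seed)
    all_votes = []
    phone_votes = []
    for _ in range(n_voters):
        wealthy = rng.random() < 0.3                              # assumed share of wealthy voters
        has_phone = rng.random() < (0.8 if wealthy else 0.1)      # phones are mostly owned by the wealthy
        votes_dewey = rng.random() < (0.7 if wealthy else 0.35)   # preference depends on wealth
        all_votes.append(votes_dewey)
        if has_phone:
            phone_votes.append(votes_dewey)
    true_share = sum(all_votes) / len(all_votes)
    polled_share = sum(phone_votes) / len(phone_votes)
    return true_share, polled_share


if __name__ == "__main__":
    true_share, polled_share = simulate_poll()
    print(f"Dewey share, whole population: {true_share:.3f}")
    print(f"Dewey share, phone owners only: {polled_share:.3f}")
```

Under these assumptions Dewey loses in the whole population but leads among phone owners, because the sample was drawn from a different distribution than the electorate.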


3. Data Snooping


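Earlier lectures explained that we should avoid peeking at the data when choosing a model, because doing so biases us toward a particular model instead of letting the choice be made independently of the data. The transform should therefore be chosen freely, preferably without looking at the raw data, which would influence our judgment.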


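In fact, data snooping happens in many ways, not only when we look at the raw data. Any step of the process that uses the data indirectly peeks at it. Each decision made after such a step adds model complexity and contaminates the result.

Here is an example. Suppose we have 8 years of currency trading data and want to find a pattern that predicts the exchange rate. We use the first 6 years as training data and the last 2 years as test data. The model takes the previous 20 days as input and predicts the movement on day 21.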


[Figure: Cumulative Profit (%), with one curve labeled "snooping" and one labeled "no snooping"]


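The figure compares the cumulative profit on the last 2 years in two cases. With snooping, the profit grows quickly. Without snooping, the profit grows much less. The difference comes from normalization. In the snooping case the values were shift-scaled using statistics computed over all 8 years, so information about the test period leaked into training. The leak looks harmless, but the optimistic result it produces would not hold in real trading, where future data are never available.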


• snooping: shift-scale all values by training + testing
• no snooping: shift-scale all values by training only
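The difference between the two bullets comes down to which data supply the shift and scale statistics. Here is a minimal sketch using only the Python standard library. The series is a hypothetical stand-in for the trading data, and `standardize` is an illustrative helper, not a function from the notes.

```python
from statistics import mean, pstdev


def standardize(values, reference):
    """Shift-scale `values` using the mean and standard deviation of `reference`."""
    mu = mean(reference)
    sigma = pstdev(reference)
    return [(v - mu) / sigma for v in values]


# Hypothetical series with an upward trend: first 6 values train, last 2 test.
series = [float(t) for t in range(1, 9)]
train, test = series[:6], series[6:]

# No snooping: statistics come from the training part only,
# and the same training statistics are reused on the test part.
clean_train = standardize(train, train)
clean_test = standardize(test, train)

# Snooping: statistics come from training + testing, so the training
# inputs already carry information about the test period.
snooped_train = standardize(train, series)

if __name__ == "__main__":
    print("train mean, no snooping:", mean(clean_train))
    print("train mean, snooping:   ", mean(snooped_train))
```

With snooping, the standardized training values are shifted below zero because the later, larger test values pulled the mean up. That shift is the leaked information.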


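Another form of data snooping appears in research. The first researcher uses a benchmark data set D, proposes model H1, and publishes a paper. A second researcher reads that paper, improves on H1, proposes H2, which performs better on D, and publishes another paper. A third researcher does the same, and so on. Each later researcher builds on earlier results and has therefore indirectly peeked at D, even without looking at it directly. The accumulated model complexity leads to overfitting and bad generalization. As the saying goes, "If you torture the data long enough, it will confess." We should not torture the data too much, or it will simply give in.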


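Avoiding data snooping matters a great deal, but avoiding it completely is hard. Two habits help in practice. The first is to be blind to the data: choose the model from experience and domain knowledge rather than from the data, that is, pick the model first and look at the data afterwards. The second is to stay suspicious: treat papers and research results, including your own, with caution, and verify them through your own study and testing before drawing conclusions.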


• be blind: avoid making modeling decision by data
• be suspicious: interpret research results (including your own) by proper feeling of contamination

4. Power of Three


This section summarizes the whole course. Conveniently, the main topics come in groups of three.


We introduced three fields related to machine learning:

• Data Mining
• Artificial Intelligence
• Statistics


Data Mining
• use (huge) data to find property that is interesting
• difficult to distinguish ML and DM in reality

Artificial Intelligence
• compute something that shows intelligent behavior
• ML is one possible route to realize AI

Statistics
• use data to make inference about an unknown process
• statistics contains many useful tools for ML





We introduced three theoretical bounds:

• Hoeffding
• Multi-Bin Hoeffding
• VC


Hoeffding
$P[\text{BAD}] \le 2\exp(-2\epsilon^2 N)$
• one hypothesis
• useful for verifying/testing

Multi-Bin Hoeffding
$P[\text{BAD}] \le 2M\exp(-2\epsilon^2 N)$
• M hypotheses
• useful for validation

VC
$P[\text{BAD}] \le 4\, m_{\mathcal{H}}(2N)\exp(\ldots)$
• all H
• useful for training
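To get a feel for the first two bounds, the sketch below evaluates them numerically and solves for the sample size that makes the bound at most a chosen tolerance. The function names and the tolerance `delta` are illustrative assumptions. The VC bound is left out because its exponent is not spelled out above.

```python
import math


def hoeffding_bound(eps, n):
    """Bound on P[BAD] for one hypothesis: 2 * exp(-2 * eps^2 * N)."""
    return 2.0 * math.exp(-2.0 * eps ** 2 * n)


def multi_bin_hoeffding_bound(eps, n, m):
    """Bound on P[BAD] for M hypotheses: 2 * M * exp(-2 * eps^2 * N)."""
    return 2.0 * m * math.exp(-2.0 * eps ** 2 * n)


def samples_needed(eps, delta, m=1):
    """Smallest N for which 2 * M * exp(-2 * eps^2 * N) <= delta."""
    return math.ceil(math.log(2.0 * m / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    print(hoeffding_bound(0.1, 1000))
    print(multi_bin_hoeffding_bound(0.1, 1000, m=10))
    print(samples_needed(0.1, 0.05))
    print(samples_needed(0.1, 0.05, m=10))
```

Because M enters only through a logarithm, choosing among more hypotheses by validation raises the required number of examples slowly.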






We introduced three linear models:

• PLA/pocket
• linear regression
• logistic regression


PLA/pocket
$h(\mathbf{x}) = \text{sign}(s)$
• plausible err = 0/1 (small flipping noise)
• minimize specially

linear regression
$h(\mathbf{x}) = s$
• friendly err = squared (easy to minimize)
• minimize analytically

logistic regression
$h(\mathbf{x}) = \theta(s)$
• plausible err = CE (maximum likelihood)
• minimize iteratively
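The three linear models share one score s and differ in how they turn it into an output and an error. The sketch below writes out the 0/1, squared, and cross-entropy errors in their usual forms for a label y in {-1, +1}. These explicit formulas and the function names are assumptions for illustration, since the summary above names the errors without spelling them out.

```python
import math


def theta(s):
    """Logistic function used by logistic regression."""
    return 1.0 / (1.0 + math.exp(-s))


def h_pla(s):
    """PLA/pocket output sign(s); mapping s == 0 to -1 is an arbitrary convention here."""
    return 1 if s > 0 else -1


def h_linreg(s):
    """Linear regression outputs the score itself."""
    return s


def h_logreg(s):
    """Logistic regression outputs theta(s), an estimate of P(y = +1 | x)."""
    return theta(s)


def err_01(s, y):
    """0/1 error: 1 when sign(s) disagrees with the label y."""
    return 0.0 if h_pla(s) == y else 1.0


def err_sqr(s, y):
    """Squared error between the score and the label."""
    return (s - y) ** 2


def err_ce(s, y):
    """Cross-entropy error ln(1 + exp(-y * s))."""
    return math.log(1.0 + math.exp(-y * s))


if __name__ == "__main__":
    for s in (-2.0, 0.5, 3.0):
        print(s, err_01(s, 1), err_sqr(s, 1), round(err_ce(s, 1), 4))
```

Note that the squared error penalizes a confidently correct score such as s = 3 with y = +1, while the 0/1 and cross-entropy errors do not grow in that direction.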


We introduced three key tools:

• Feature Transform
• Regularization
• Validation










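Feature Transform
$E_{in}(\mathbf{w}) \to E_{in}(\tilde{\mathbf{w}})$, $d_{VC}(\mathcal{H}) \to d_{VC}(\mathcal{H}_{\Phi})$
• by using more complicated $\Phi$
• lower $E_{in}$
• higher $d_{VC}$

Regularization
$E_{in}(\mathbf{w}) \to E_{in}(\mathbf{w}_{REG})$, $d_{VC}(\mathcal{H}) \to d_{EFF}(\mathcal{H}, \mathcal{A})$
• by augmenting regularizer $\Omega$
• lower $d_{EFF}$
• higher $E_{in}$

Validation
$E_{in}(h) \to E_{val}(h)$, $\mathcal{H} \to \{g_1^-, \ldots, g_M^-\}$
• by reserving K examples as $D_{val}$
• fewer choices
• fewer examples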
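The validation column can be sketched in a few lines: reserve K examples, train each candidate on the rest, and keep the candidate with the lowest validation error. The two candidate learners, the toy data, and all names below are assumptions made for illustration. In practice the reserved examples would be chosen at random rather than taken from the end.

```python
def fit_constant(data):
    """Candidate 1: predict the mean of y."""
    c = sum(y for _, y in data) / len(data)
    return lambda x: c


def fit_line(data):
    """Candidate 2: one-dimensional least-squares line."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def mean_squared_error(h, data):
    return sum((h(x) - y) ** 2 for x, y in data) / len(data)


def select_by_validation(data, k, learners):
    """Reserve the last k examples as D_val and pick the learner with the lowest E_val."""
    d_train, d_val = data[:-k], data[-k:]
    e_val = {
        name: mean_squared_error(fit(d_train), d_val)
        for name, fit in learners.items()
    }
    best = min(e_val, key=e_val.get)
    # Retrain the winner on all examples, as is usual after selection.
    return best, learners[best](data), e_val


# Toy data generated from y = 2x + 1 (an assumption for illustration).
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
best_name, g, e_val = select_by_validation(
    data, k=3, learners={"constant": fit_constant, "line": fit_line}
)

if __name__ == "__main__":
    print(best_name, e_val)
```

The trade-off listed above is visible here: each candidate is trained on fewer examples, and in exchange the final choice is made among only M finished hypotheses.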


This lecture introduced three learning principles:

• Occam's Razor
• Sampling Bias
• Data Snooping






• Occam's Razor: simple is good
• Sampling Bias: class matches exam
• Data Snooping: honesty is best policy









Finally, there are three directions for future study:

• More Transform
• More Regularization
• Less Label





More Transform, More Regularization, Less Label

Topics shown on the slide for these directions: bagging, decision tree, support vector machine, neural network, kernel, AdaBoost, aggregation, sparsity, autoencoder, coordinate descent, dual, uniform blending, deep learning, nearest neighbor, decision stump, kernel LogReg, large-margin, prototype, quadratic programming, SVR, GBDT, PCA, random forest, matrix factorization, Gaussian kernel, k-means, OOB error, RBF network, probabilistic SVM, soft-margin.





5. Summary


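This lecture introduced three important principles of machine learning: Occam's Razor, Sampling Bias, and Data Snooping. It also summarized the knowledge and methods covered in the Machine Learning Foundations course in the form of the "Power of Three", which together make up a solid foundation for machine learning.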


This completes the lecture notes for the Machine Learning Foundations course. Thank you!


Note:
All figures in this article come from Hsuan-Tien Lin's Machine Learning Foundations course at National Taiwan University.


